EVALUATING THE OBJECTIVES IN
FOREIGN-LANGUAGE TEACHING *)



Rebecca M. Valette 



Modern foreign-language teaching emphasizes comprehension and the ability to
express oneself in the spoken as well as the written language. This change in
aims has led to the creation of new forms of tests and examinations designed to
permit rigorously objective evaluation. But it is not enough to evaluate globally
the student’s knowledge of vocabulary, morphology and syntax; one must be able
to evaluate the student’s relative degree of skill in understanding, speaking,
reading and writing the foreign language. The author examines various methods
of evaluation, either pure or hybrid, that are specific to each of these domains
and sketches at the end some directions for future research and development.



In modern foreign-language teaching, particular weight is placed on comprehension
and on the ability to command the spoken language as well as the written. This
aim has resulted in the development of new tests and examination methods
intended to permit as objective an assessment of achievement as possible. It is
not sufficient, however, to judge the student’s knowledge of vocabulary,
morphology and syntax as a whole. Rather, it must be possible to determine for
the individual student the relative degree to which he understands, speaks,
reads and writes the foreign language. The author examines various methods of
assessment, pure and hybrid forms, that are specially suited to each of these
areas and concludes by sketching some suggestions for research and future
development.

Spoken language is a phenomenon exceedingly more complex than its
graphic representation, the printed word. However, since the study of literature, 
which for centuries represented the educational goals of the elite, necessitated 
the acquisition of a reading knowledge of foreign languages, language and 
literature were formally equated in the curriculum. In restricting their concept 
of language, educators disregarded the idea of language as oral communication 
and chose to ignore the fact that throughout history conquered peoples, immi- 
grants and travellers did adapt to a new linguistic environment. Some people 
acquired a limited vocabulary, which they modeled according to the grammatical 
patterns and sound system of their native language. Others achieved proper 
fluency — in varying degrees, of course. In his new surroundings each individual 
speaking the language evaluated his own progress in terms of his success in 
communication. On the other hand, in the schools the scoring of a student’s
ability to listen and to speak was almost totally neglected. Written classroom
tests measured the student’s grasp of literature.

*) Paper read June 28, 1965, in the American Educational Research Association
Symposium at the annual meeting of the National Education Association.

Now that language teachers have been broadening their aims and developing
a curriculum destined not only to teach the core of the language, that is, its
words and structures, but also to build up student proficiency in the skills of
listening and speaking as well as of reading and writing, new evaluation
techniques are being introduced. As in any relatively unexplored field, the
initial advances appear spectacular. While in no way detracting from the
accomplishments of recent years, I propose to examine the various current
methods of measuring the language skills and to suggest areas for further
research.

Let us first look briefly at language itself. As you may remember, Alice in
Wonderland in her journey through the Looking Glass was intrigued with
Jabberwocky:



’Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.

The grammatical structure is English: “Slithy” is an adjective modifying “toves”
(the plural of “tove”); “gyre” and “gimble” are verbs, as is obvious from their 
position relative to “did”; “outgrabe” is the past tense of “outgribe”. Alice’s 
comprehension problem was uniquely one of vocabulary. 

The recently developed modern language curriculum known as the New Key
consists of audio-lingual materials which, in Levels I and II, emphasize the
acquisition of structure rather than rapid vocabulary development. After all, 
the structure of the language indicates the relationships among the parts of the 
sentence. A single unfamiliar word can be looked up in a dictionary, but a new 
construction poses a dilemma. 

Both foreign-language usage and vocabulary adapt themselves well to objective
measurement, much as they do in English. The first standardized tests,
established in the 1920’s, concentrated primarily on these two fundamental 
aspects of the foreign language; as for English, printed multiple-choice tests 
proved most reliable and most economical in terms of scoring time. 

The new curriculum materials have been devised on the assumption that 
foreign-language learning is basically a mechanical process of habit formation. 
Students learn a natural dialogue by ear. The words and structures of that dia- 
logue are presented in a variety of ways, primarily through pattern drills and 
directed dialogues. Before advancing to the next lesson, the student should be 
so familiar with the material just covered that he can speak it fluently and
correctly. Since New Key proponents define language primarily as “behavior”, the
material is so structured that the student is induced to speak or behave without 
making mistakes. The entire lesson, with the exception of perhaps a few minutes 
at either the beginning or the end, is conducted in the foreign language.
Homework at the elementary level consists in language laboratory sessions or listening
at home to tapes or discs recorded by native speakers. Once the student is at ease 
with the spoken language, he is introduced to the skills of reading and writing, 
again in such a systematic way as almost to eliminate the possibility of error. 

Within the New Key framework, tests conform to rigid qualifications. No 
wrong forms are employed; consequently, on a multiple-choice test all choices
are idiomatic and properly spelled but only one choice constitutes an appropriate 
answer. Since the foreign language is taught as a system of communication 
with the least possible reference to English, test items appear entirely in the 
foreign language. Mixed sentences, partly in English and partly in the target
language, are eliminated as inconsistent with the aims. If an English equivalent
must be included to clarify the meaning of the item, then two entire sentences
should be used. Thus, whereas homework sentences or test items once read:

Je n'aime pas cette robe-ci, je préfère ______ (that one)
such questions now appear as

I don’t like this dress, I prefer that one.

Je n'aime pas cette robe-ci, je préfère ______ .

In these two sentences the student sees that there is no one-to-one word-to-word 
correspondence between “je n'aime pas” and “I don’t like.” Thus he has less
tendency to look for a French word for “that” and another for “one” but is 
encouraged to find semantically equivalent structures in the two languages. 
Finally, New Key examinations include only natural constructions. Contrived 
sentences containing pitfalls which would stump even the native speaker have 
been eliminated. 

However, the measurement of achievement in language learning must not 
stop here. Admittedly, the word, the grammatical element and the syntactic 
pattern are the building blocks of the sentence. Without them one can neither 
understand nor speak, neither read nor write the language. But since the New 
Key curriculum emphasizes the acquisition of all four language skills, 
standardized tests are needed to evaluate the student’s relative proficiency in 
listening, speaking, reading and writing. 

A language is not like football, for example. The coach can measure 
independently the player’s knowledge of the vocabulary and the structure of the 
game, the rules, the specific plays and their code numbers. On the field he can 
isolate and evaluate proficiency in each skill, in blocking, in passing, in 
receiving, in punting. But the language instructor cannot teach or test the 
language skills without employing vocabulary, grammar and syntax. Each
language skill, however, has certain unique particularities. Let us look at these
distinctive features and see how they can best be evaluated. 

The four skills may be measured separately or in combination. In a pure test
only one skill is utilized. Tests built around two or three skills, for example
listening and reading or listening and writing, will be termed hybrid.

Listening. One problem in aural comprehension is presented by what the
linguists call “minimal pairs”, words that differ by only one phoneme. An 
English example may clarify this concept. Many foreigners learning English
have trouble distinguishing the vowel sounds in “ham”, “hem” and “hymn”. 
A listening test item would read : 

Choose the logical rejoinder : 

She has got the ham ready. 

1. Well, let’s sit down to eat. 

2. And Mother has offered to sew it for me. 

3. So let’s sing it through to see how it sounds. 

Had the student understood “hem”, the completion mentioning sewing would 
have seemed plausible; had he understood “hymn” he might have imagined a 
choir rehearsal. For other languages one can prepare similar listening items in 
which comprehension hinges on the discrimination of a single sound. Both the 
key sentence and the possible completions or rejoinders would be prerecorded
on tape. Another version of the same type of item would have only the key 
sentence recorded ; options 1, 2, and 3 would be pictures showing, respectively, 
a woman with a ham, a woman holding the hem of a dress, and an organist 
with a hymnal. It is also possible to record just the key word “ham”, and show 
the student numbered pictures of a ham, a hem, and a hymn. However, a whole 
sentence furnishes a more natural context.

A second problem unique to the listening skill is the comprehension of 
rapid conversation. Perhaps you have had the following experience: Lost in 
Paris, you asked directions of a policeman and found his French response easily 
comprehensible; but when this same person began talking with another 
Frenchman, you were totally unable to follow the conversation. Consider for a 
moment the phrase “jeetjet.” It sounds like nonsense syllables. But were a 
friend to ask “Jeetjet?” you might answer: “No, but I plan to get a sandwich
after this symposium.” To understand longer sentences, the student must 
increase his retention span and learn to pick out key words. Recorded listening 
comprehension tests, presenting a quick dialog followed by clearly enunciated 
questions and multiple-choice responses, can validly measure the student’s 
ability to understand rapid speech. 

Many other types of listening tests exist. Inasmuch as they measure compre- 
hension of a distinctly recorded conversation or passage which is not built 
around minimal sound distinctions, such tests primarily evaluate student 
achievement in structure and vocabulary via the listening skill. Even students 
with poor discrimination and no training in rapid conversation will do well on 
such examinations if they are familiar with the content of the items. 

So far we have discussed pure listening tests, those in which the entire
examination is recorded and the student indicates only a letter response on an
answer sheet. In the 1930’s phonetic accuracy tests were developed in a
multiple-choice format. Since the skills of reading and writing were receiving greater
emphasis in the classroom, the student’s listening discrimination could be 
reliably measured in relation to the printed word. Were such a test given in 
English, the student would read : 



1. He’s sleeping. 

2. He’s slipping. 

3. He’s leaping. 

The recorded voice would state once: “He’s sleeping”. The student is called
upon to select the proper phrase. In the New Key classroom, where the spoken
language is presented before the written language, such items at the elementary
level might tend to become spelling tests in that they measure the relationship 
between the sounds, which are already familiar, and the printed word. 

Some hybrid listening tests currently in use are administered with an answer
booklet. The student hears a recorded conversation or passage and then answers
either spoken or printed questions by indicating his selection among the
suggested responses he reads in the booklet. The student who reads with difficulty
and is more at ease with the spoken language will be at a disadvantage in this
type of “listening” test. The first hybrid listening examinations, employed in
the 1920’s, avoided this possible danger by presenting the printed section
entirely in English.

Speaking. The specific elements characteristic of the speaking skill are
pronunciation, intonation and fluency. In 1929, the Modern Foreign Language Study
report on achievement tests pessimistically stated: “Standardized group tests
for pronunciation and oral composition which could be administered widely
seem almost impossibilities” 1). However, four years later experiments were in
progress using phonographic aluminum disks. In recent years the rapid growth
of language laboratories has facilitated the administration of identical speaking
tests to large groups of students. Trained scorers, often working in groups,
have demonstrated the possibility of rating student performance quite objectively.
Only specific aspects of each utterance are scored, but the student does not
know what the examiner will listen for. As early as the 1930’s it was noted that
reliable scores could be obtained. This was confirmed by an experiment
conducted at the University of Colorado in 1960-1962, by the MLA Cooperative
Foreign Language Tests, and by the experimental Pimsleur tests.

Unfortunately, most speaking tests are tedious to score because each student 
tape must be played in its entirety. Since speaking tests are often administered 
with a spoken cue such as a question or command, the scorers must spend 
time listening to the same cues on every student tape. Nelson Brooks has
thought of a system whereby the student activates his own tape whenever he
speaks and thus records only his own voice 2).

1) V. A. C. Henmon. Achievement Tests in the Modern Foreign Languages,
Publications of the American and Canadian Committees on Modern Languages,
Vol. 5, New York, 1929, p. 3.

The ability to read aloud has generally been evaluated on the basis of a 
relatively long narrative or dramatic selection. George Scherer has found that
this type of oral performance can be reliably graded by means of a passage as
short as four lines, again reducing the scoring time 3). Another interesting
experiment is underway at the Education Office of the Supreme Headquarters 
of Allied Powers in Europe (SHAPE) near Paris where pronunciation is being 
measured with the Kay Sonagraph. The student sentence is graphically reproduced
so that the relative intensities of frequencies from 0 to 8000 cycles per second are
represented as a function of time. On the sonagram the pronunciation of vowels 
and consonants can be evaluated visually. A different setting on the machine 
yields a graphic picture of the intonation. Whereas all voices give a readable 
intonation curve, it is unfortunately often difficult to read the pronunciation 
sonagrams of women’s voices. 

In a pure speaking test the student is asked to talk about a suggested topic, to
describe a picture, or to give directions according to a map or diagram. Such
tests, scored on fluency and overall quality, are generally administered as the
final section of a longer examination. 

If students are to be evaluated on their pronunciation of a certain sound or 
on the intonation pattern they give a particular phrase, then all must utter the 
same sentence. It appears almost impossible to utilize a pure speaking test to
elicit such a response. Consequently other means are employed : the student 
recites or records a memorized passage or poem; the student repeats a sentence 
he hears on the tape; the student answers a specific prerecorded question 
according to a model response; the student reads a printed passage or sentence. 
In the audio-lingual curriculum, spoken cues or directions seem preferable at 
the elementary levels. Advanced students more familiar with the printed word 
may record sentences read from a test booklet. At this point in their training
they are less likely to allow the written forms to interfere with their pronuncia- 
tion. Tests in which students read aloud have the advantage of cutting the 
scoring time in half since the judges need not listen to spoken cues. 

Reading. The reading skill is characterized by speed and the recognition of 
structure in long or complex sentences. Reading comprehension, in this sense, 
is similar to reading comprehension in English, an area in which not all Ameri- 
can students attain equal proficiency. The speed with which a student reads a 
foreign language can be measured by timing student performance on a reading 
comprehension test, or, more objectively, by administering a long test which the
students are unable to finish within a given period of time. Comprehension, in
relation to speed, is evaluated by multiple-choice questions based on the text or
passage.

2) New developments in modern languages testing. Educational Horizons, Fall
1964, 21-25.

3) George Scherer and Michael Wertheimer. A psycholinguistic experiment in
foreign-language teaching. New York, 1965, p. 122.

When it comes to reading, the advocates of the New Key insist on no more
“decoding”. If the student is really to read the foreign language without mentally
translating, new words cannot be introduced too frequently at the early levels.

The most widely used standardized tests in foreign languages, the College Boards
and the Regents, are printed objective tests. Many of the items on
these tests do not measure reading comprehension, but directly measure grammatical
usage and vocabulary. A surprisingly large percentage of items on the MLA
Cooperative Tests and the experimental Pimsleur Reading Comprehension
tests also hinge directly on vocabulary. This factor might account for some of
the coolness which certain teachers have shown the New Key materials. In our
test-oriented society, many teachers find themselves judged by their students’
performance on such reading comprehension tests. Moreover, many teachers,
eager to have their second- and third-year students do well on the College Boards,
have been supplementing the new audio-lingual materials with outside reading
in order to increase students’ vocabulary. The teachers also realize that most
present standardized listening examinations have written options and that
consequently, unless the student knows how to read, he won’t pass a listening
comprehension test.

Writing. The specific elements characteristic of the writing skill are spelling
and style. Since free or [...] compositions cannot be scored objectively,
questions of style must be judged by the individual qualified teacher. Spelling
tests, in the form of isolated words or paragraph dictation, depend on a spoken
stimulus and are generally limited in use to the classroom.

Forty years ago almost all extramural language tests were written tests. The
advent of the standardized language examinations brought with it the introduction
of completion items. Soon the recognized ease and economy of mechanical
scoring relegated writing tests to the classroom. Currently the MLA Coop tests and
the experimental Pimsleur tests are reintroducing writing samples, which generally
evaluate the student’s active knowledge or recall of foreign language usage
and structure. Since objectivity requires that a single specific response be elicited,
such tests may assume the form of fill-in-the-blank passages or sentence
transformation exercises. The authors of New Key materials, trying to avoid
translation by elementary and intermediate students, have popularized two new types
of writing tests. The first type is the “dehydrated” sentence. In English, for
example, the student would read: “Joan-go-school-yesterday”. In the second
type the student reads a model sentence such as “Joan went to school yesterday”
followed by a series of words “girls-come-rehearsal-last-week”. The new
sentence would read: “The girls came to the rehearsal last week.”
Having examined various methods of evaluating the four language skills, let
us turn to areas which merit future research. 

With the publication of the new MLA Cooperative Foreign Language Tests,
standardized examinations are now available in all four language skills. The
norms established for these separate tests allow the teacher to appraise each
student’s standing relative to a nationwide sample. What remains to be done?

1. Comparable tests in the four skills should be developed. Relative student
proficiency in the language can only be validly ascertained if equivalent tests in
the four skills are created, each test utilizing similar structures and vocabulary.
Such tests will permit a more valid evaluation of the relative merits of various
teaching techniques.

2. Pure listening comprehension tests should be standardized, and their
results nationally compared to those of pure reading tests. 

3. Another research project might profitably investigate the precise
relationship between the comprehension and the production skills, between
recognition and recall. There is a positive correlation between the presently used
listening and speaking tests. Equivalent forms of listening discrimination and
pronunciation tests should be devised to establish how high the correlation between
hearing and speaking really is under the following conditions:

(a) for students who believe that only the listening test score will determine
their grade,

(b) for students who believe that only the speaking test score will determine
their grade,

(c) for students who believe that both the speaking and the listening test scores
will determine their grade.

Equivalent writing and reading tests could be developed to define with more
precision the relationship between recall and recognition in tests of usage and
grammatical structure. This battery would similarly be administered to three
groups of students who respectively have prepared for a reading test only, a
writing test only, and both tests together. A very high correlation in a given
category would permit the indirect measurement of student recall through the
use of recognition items which permit objective testing, high reliability and
economy in scoring.
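
The comparison proposed in point 3 could be tabulated along the following lines. This is only a minimal sketch: it assumes that the correlation meant is the ordinary Pearson product-moment coefficient, and the function names, group labels and scores are all hypothetical, invented for illustration; none of them comes from the tests discussed in this paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores in each list must vary")
    return cross / sqrt(ss_x * ss_y)


def correlations_by_group(groups):
    """Map each group label to the correlation of its two score lists."""
    return {label: pearson_r(first, second)
            for label, (first, second) in groups.items()}


# Hypothetical (listening, speaking) scores for the three conditions (a)-(c).
# The numbers are made up solely to show the shape of the data.
example_groups = {
    "a_listening_counts": ([12, 15, 9, 18, 14], [10, 13, 8, 15, 11]),
    "b_speaking_counts": ([11, 16, 10, 17, 13], [14, 15, 9, 18, 12]),
    "c_both_count": ([13, 14, 8, 19, 15], [12, 15, 9, 18, 14]),
}

if __name__ == "__main__":
    for label, r in correlations_by_group(example_groups).items():
        print(f"{label}: r = {r:.2f}")
```

The same two functions would serve for the reading and writing battery: each group would simply carry a pair of (reading, writing) score lists instead.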

4. Studies for each foreign language could determine the applicability of
what Robert Lado terms the “partial production” technique. In a partial
production test, printed items are employed to evaluate the recall skills of speaking and
writing. Lado measured pronunciation, stress and intonation in English by
having the students indicate similarities between printed words or sentences. For
example, which of the following does not clearly rhyme with the others: “food”
“good” “stood” “wood”? Such an item is effective in English because of the
irregular fit between pronunciation and spelling 4).



4) Language Testing. New York, 1964, 95-104.
5. The validity of the dictation in measuring the student’s command of the
language should be investigated. Various methods of scoring dictations should 
be compared. 

Achievement testing in foreign languages is an open field. Recent advances 
have been notable, but in the coming years we can look forward to a variety of 
new developments in language testing. 

Rebecca M. Valette 
Boston College 
Boston, Mass. 